{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 解决大模型数据新鲜度低的问题\n",
    "* 导入文本\n",
    "* 使用搜索引擎/外部工具/API\n",
    "* 向量数据库"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai\n",
    "import os\n",
    "\n",
    "\n",
    "openai.api_key = os.getenv('MY_API_KEY')\n",
    "openai.api_base = os.getenv('MY_API_BASE')\n",
    "\n",
    "chat_history = []"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 定义辅助函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def user_message(msg):\n",
    "    item = {\"role\": \"user\", \"content\": msg}\n",
    "    chat_history.append(item)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def assistant_message(msg):\n",
    "    item = {\"role\": \"assistant\", \"content\": msg}\n",
    "    chat_history.append(item)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def system_message(msg):\n",
    "    item = {\"role\": \"system\", \"content\": msg}\n",
    "    chat_history.append(item)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def clear_history():\n",
    "    chat_history.clear()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "def dump_chat_history():\n",
    "    print(json.dumps(chat_history,ensure_ascii=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 辅助函数\n",
    "def predict(model = 'gpt-3.5-turbo'):\n",
    "    response = openai.ChatCompletion.create(\n",
    "        model = model,\n",
    "        messages = chat_history,\n",
    "        user = \"llm_cource2\",\n",
    "        # 是一个介于 0 ~ 1 之间的数，数值越大，代表生成的结果越不一致/或者稳定\n",
    "        temperature = 0,\n",
    "    )\n",
    "    assistant_message(response.choices[0].message['content'])\n",
    "    return response.choices[0].message['content']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 导入文本"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入txt文本\n",
    "def load_knowledge_from_txt(path):\n",
    "    with open(path,'r',encoding='utf8') as file:\n",
    "        content = file.read()\n",
    "\n",
    "        file.close()\n",
    "    return content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "load_knowledge_from_txt(\"./example.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install PyPDF2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import PyPDF2\n",
    "\n",
    "# 导入pdf\n",
    "def load_knowledge_from_pdf(path):\n",
    "    with open(path,'rb') as file:\n",
    "        pdf_reader = PyPDF2.PdfReader(file)\n",
    "\n",
    "        total_pages = len(pdf_reader.pages)\n",
    "\n",
    "        content = \"\"\n",
    "\n",
    "        for page_index in range(total_pages):\n",
    "            page = pdf_reader.pages[page_index]\n",
    "            text = page.extract_text()\n",
    "            content += text\n",
    "\n",
    "        file.close()\n",
    "    return content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "load_knowledge_from_pdf(\"./example.pdf\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clear_history()\n",
    "user_message(\"Sam Altman 为什么被董事会罢免？\")\n",
    "predict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clear_history()\n",
    "\n",
    "knowledge = load_knowledge_from_txt(\"./example.txt\")\n",
    "\n",
    "prompt = f\"\"\"\n",
    "你现在是一个问答助手，你需要优先根据以下的知识回答用户问题，如果以下提供的知识不足以回答用户问题，你可以根据自己的理解回答。\n",
    "```\n",
    "{knowledge}\n",
    "```\n",
    "\"\"\"\n",
    "system_message(prompt)\n",
    "user_message(\"Sam Altman 为什么被董事会罢免？\")\n",
    "predict()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dump_chat_history()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clear_history()\n",
    "\n",
    "knowledge = load_knowledge_from_pdf(\"./example.pdf\")\n",
    "\n",
    "prompt = f\"\"\"\n",
    "你现在是一个问答助手，你需要优先根据以下的知识回答用户问题，如果以下提供的知识不足以回答用户问题，你可以根据自己的理解回答。\n",
    "```\n",
    "{knowledge}\n",
    "```\n",
    "\"\"\"\n",
    "system_message(prompt)\n",
    "user_message(\"除了Sam Altman，还有谁离开了openai\")\n",
    "predict()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 问题\n",
    "* 如果文本太长，超出了chatgpt的token限制怎么办？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用搜索引擎"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "def search_with_bing(keyword):\n",
    "    headers = {\"Ocp-Apim-Subscription-Key\": os.getenv('BING_API_KEY')}\n",
    "    params = {\"q\": keyword, 'mtk':'zh_CN','count':15}\n",
    "    response = requests.get('https://api.bing.microsoft.com/v7.0/search', headers=headers, params=params)\n",
    "    response.raise_for_status()\n",
    "    search_results = response.json()\n",
    "    result = \"\"\n",
    "    for item in response.json()['webPages']['value']:\n",
    "        result += item['snippet']\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_with_bing(\"北京今天天气\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_message(\"北京今天天气怎么样？\")\n",
    "predict()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "clear_history()\n",
    "\n",
    "prompt = f\"\"\"\n",
    "你现在是一个AI助手，你需要耐心解答用户问题，如果你不知道，你可以输出'[search('keyword')]'，其中'keyword'是对用户问题的总结，总结需要尽量简洁且对搜索引擎友好，我会使用搜索引擎来协助你回答用户问题。\n",
    "下面是一个服务例子：\n",
    "user: 今天天气如何？\n",
    "assitant: [search('今日天气')]\n",
    "\"\"\"\n",
    "system_message(prompt)\n",
    "user_message(\"北京今天天气怎么样？\")\n",
    "response = predict()\n",
    "\n",
    "pattern = r\"\\[search\\('(.+?)'\\)\\]\"\n",
    "\n",
    "match = re.search(pattern,response)\n",
    "if match:\n",
    "    keyword = match.group(1)\n",
    "    print(keyword)\n",
    "    search_result = search_with_bing(keyword)\n",
    "    system_message(search_result)\n",
    "    response = predict()\n",
    "    print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 向量数据库\n",
    "使用数据库就是更加方便的管理你的私有知识库\n",
    "这里使用chromadb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install chromadb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义几个数据库辅助函数\n",
    "* 新建数据集\n",
    "* 插入数据\n",
    "* 检索数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import chromadb\n",
    "from chromadb.utils import embedding_functions\n",
    "\n",
    "client = chromadb.Client()\n",
    "openai_ef = embedding_functions.OpenAIEmbeddingFunction(api_key = os.getenv('MY_API_KEY'),api_base = os.getenv('MY_API_BASE'),model_name=\"text-embedding-ada-002\")\n",
    "client.delete_collection('my_collection')\n",
    "colleciton = client.create_collection('my_collection',embedding_function=openai_ef)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 插入数据\n",
    "import hashlib\n",
    "def insertion(doc):\n",
    "    hash = hashlib.md5(doc.encode('utf8')).hexdigest()\n",
    "    colleciton.add(\n",
    "        documents=[doc],\n",
    "        metadatas=[{'md5':hash}],\n",
    "        ids=[hash]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 检索数据\n",
    "def query_from_vec_db(keyword):\n",
    "    results = colleciton.query(\n",
    "        query_texts=[keyword],\n",
    "        n_results=10\n",
    "    )\n",
    "    print(results)\n",
    "    return results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "插入几条数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Add of existing embedding ID: 23669b5430f11d6766a933fc8c39f04e\n",
      "Insert of existing embedding ID: 23669b5430f11d6766a933fc8c39f04e\n",
      "Add of existing embedding ID: 117bb3155c55e8c047eac2991f223a9d\n",
      "Insert of existing embedding ID: 117bb3155c55e8c047eac2991f223a9d\n",
      "Add of existing embedding ID: f138d4c353d3a128df6dc59503cec778\n",
      "Insert of existing embedding ID: f138d4c353d3a128df6dc59503cec778\n",
      "Add of existing embedding ID: ab9a70148ae795335a209ca3a779f935\n",
      "Insert of existing embedding ID: ab9a70148ae795335a209ca3a779f935\n",
      "Add of existing embedding ID: a480b3d721c27662ec801eb2d689f688\n",
      "Insert of existing embedding ID: a480b3d721c27662ec801eb2d689f688\n",
      "Add of existing embedding ID: 542cef2a6ac6b2d3825bb59a85ac4d2a\n",
      "Insert of existing embedding ID: 542cef2a6ac6b2d3825bb59a85ac4d2a\n",
      "Add of existing embedding ID: a83ff3a688edb373268c7f6a4535658f\n",
      "Insert of existing embedding ID: a83ff3a688edb373268c7f6a4535658f\n",
      "Add of existing embedding ID: 1327ac8938fda398d07081c97197790d\n",
      "Insert of existing embedding ID: 1327ac8938fda398d07081c97197790d\n",
      "Add of existing embedding ID: 722d2b10d62c81e52341fddc66bfc7a0\n",
      "Insert of existing embedding ID: 722d2b10d62c81e52341fddc66bfc7a0\n",
      "Add of existing embedding ID: 2b6d0f4f95c40f20c5218e6bc135e015\n",
      "Insert of existing embedding ID: 2b6d0f4f95c40f20c5218e6bc135e015\n",
      "Add of existing embedding ID: 5c363d05c69073052e634205512c222f\n",
      "Insert of existing embedding ID: 5c363d05c69073052e634205512c222f\n",
      "Add of existing embedding ID: 24a1ac6221b0c7539f6cdd1056f7bcf1\n",
      "Insert of existing embedding ID: 24a1ac6221b0c7539f6cdd1056f7bcf1\n",
      "Add of existing embedding ID: fc5e3b3dff275d9407215790714a0515\n",
      "Insert of existing embedding ID: fc5e3b3dff275d9407215790714a0515\n",
      "Add of existing embedding ID: 86a282f8dd947fa79d65685e4ee678b0\n",
      "Insert of existing embedding ID: 86a282f8dd947fa79d65685e4ee678b0\n",
      "Add of existing embedding ID: ec410bded45fa5ca95d5816430f3366a\n",
      "Insert of existing embedding ID: ec410bded45fa5ca95d5816430f3366a\n",
      "Add of existing embedding ID: d81638dc38b0f9b935afa542b1b546f1\n",
      "Insert of existing embedding ID: d81638dc38b0f9b935afa542b1b546f1\n",
      "Add of existing embedding ID: 4af94c16ea408031808158263f691d0b\n",
      "Insert of existing embedding ID: 4af94c16ea408031808158263f691d0b\n",
      "Add of existing embedding ID: 64f966f1121f6f9ca9ce063c45ed041c\n",
      "Insert of existing embedding ID: 64f966f1121f6f9ca9ce063c45ed041c\n",
      "Add of existing embedding ID: 31caee0ad1aa8b854a0a53e1dca116d0\n",
      "Insert of existing embedding ID: 31caee0ad1aa8b854a0a53e1dca116d0\n",
      "Add of existing embedding ID: afbf1d4740f869f6df4160bae2b56599\n",
      "Insert of existing embedding ID: afbf1d4740f869f6df4160bae2b56599\n",
      "Add of existing embedding ID: 2b7ad9e9ef2a3b21af97b5490854034e\n",
      "Insert of existing embedding ID: 2b7ad9e9ef2a3b21af97b5490854034e\n",
      "Add of existing embedding ID: 370ad0a4f00f2d76c62260f103fd00db\n",
      "Insert of existing embedding ID: 370ad0a4f00f2d76c62260f103fd00db\n",
      "Add of existing embedding ID: 734c1ad014cf0d1f247db1018e0de4e9\n",
      "Insert of existing embedding ID: 734c1ad014cf0d1f247db1018e0de4e9\n",
      "Add of existing embedding ID: c1ed2c2bda4c0c58778fed9a4d60a81a\n",
      "Insert of existing embedding ID: c1ed2c2bda4c0c58778fed9a4d60a81a\n",
      "Add of existing embedding ID: aec28668f3c3c8475e47c72832b63f24\n",
      "Insert of existing embedding ID: aec28668f3c3c8475e47c72832b63f24\n",
      "Add of existing embedding ID: 619d5a648fd52bbbd30cf1584c97cad5\n",
      "Insert of existing embedding ID: 619d5a648fd52bbbd30cf1584c97cad5\n",
      "Add of existing embedding ID: 1e1be4619d822b1314fffa4d0756476a\n",
      "Insert of existing embedding ID: 1e1be4619d822b1314fffa4d0756476a\n",
      "Add of existing embedding ID: a815e28ff9a0fd0072a0644f12a8a148\n",
      "Insert of existing embedding ID: a815e28ff9a0fd0072a0644f12a8a148\n",
      "Add of existing embedding ID: e58cca1e4c8a41dd649c56fc2633e441\n",
      "Insert of existing embedding ID: e58cca1e4c8a41dd649c56fc2633e441\n",
      "Add of existing embedding ID: 2a44ed9970e5cb24b61b03ec92eada5c\n",
      "Insert of existing embedding ID: 2a44ed9970e5cb24b61b03ec92eada5c\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "content = load_knowledge_from_txt(\"./example.txt\")\n",
    "docs = content.split(\"。\")\n",
    "for doc in docs:\n",
    "    insertion(doc)\n",
    "    time.sleep(10)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'ids': [['117bb3155c55e8c047eac2991f223a9d', '64f966f1121f6f9ca9ce063c45ed041c', 'e58cca1e4c8a41dd649c56fc2633e441', 'f138d4c353d3a128df6dc59503cec778', '31caee0ad1aa8b854a0a53e1dca116d0', '370ad0a4f00f2d76c62260f103fd00db', 'd9f3952b0c2d8dd486aa9dd83ae8ce99', 'afbf1d4740f869f6df4160bae2b56599', 'd81638dc38b0f9b935afa542b1b546f1', 'fc5e3b3dff275d9407215790714a0515']], 'distances': [[0.2700326144695282, 0.270065575838089, 0.3001212179660797, 0.3281254470348358, 0.3289717435836792, 0.3323637545108795, 0.33528652787208557, 0.337248831987381, 0.3567003011703491, 0.3608716130256653]], 'metadatas': [[{'md5': '117bb3155c55e8c047eac2991f223a9d'}, {'md5': '64f966f1121f6f9ca9ce063c45ed041c'}, {'md5': 'e58cca1e4c8a41dd649c56fc2633e441'}, {'md5': 'f138d4c353d3a128df6dc59503cec778'}, {'md5': '31caee0ad1aa8b854a0a53e1dca116d0'}, {'md5': '370ad0a4f00f2d76c62260f103fd00db'}, {'md5': 'd9f3952b0c2d8dd486aa9dd83ae8ce99'}, {'md5': 'afbf1d4740f869f6df4160bae2b56599'}, {'md5': 'd81638dc38b0f9b935afa542b1b546f1'}, {'md5': 'fc5e3b3dff275d9407215790714a0515'}]], 'embeddings': None, 'documents': [['\\n关于离职原因，OpenAI 方面给出的理由是：Sam Altman 先生的离职是在董事会经过审议后得出的结论，他在与董事会的沟通中始终不坦诚，阻碍了董事会履行职责的能力', '\\n总结来看，自愿放弃创始股份的联合创始人兼 CEO Sam Altman 被 拥有股份的联合创始人兼首席科学家 Ilya Sutskever 和 3 个外部独立董事构成的董事会解除 CEO 职务', '\\n虽然不知道这是否暗示了 Sam Altman 与董事会之间存在矛盾，但是 Sam Altman 的突然离开确实让外界感到震惊，包括 OpenAI 的员工都是在看到公告后才得知这一消息的，未来这位“传奇天才”是否会开创一家新的公司，甚至 与 OpenAI 竞争都不得而知了', '董事会不再对他继续领导 OpenAI 的能力充满信心\\nSam Altman 也通过个人社交网站回应了离职一事儿：\\n关于临时继任者 Mira，其 在 OpenAI 领导团队任职五年，在 OpenAI 发展成为全球人工智能领导者的过程中发挥了关键作用', '另一联合创始人兼总裁 Greg Brockman ，卸任董事会主席', '\\n有参加了此次交流会的开发者表示，因为这是闭门交流会，所以 Altman 在交谈中表现出了开放的心态，讨论内容既涉及开发者面临的实际问题，也延伸到了商业竞争、AI 监管和开源等问题', '有人指出，GPT 商店和收入分享是 Sam Altman 做的单方面商业决定，追求利润，而非一个负责任的非营利组织原则，威胁到了企业的核心价值', '\\n与董事会之间存在矛盾，是否早有端倪？\\n今年 6 月份，OpenAI 创始人 Sam Altman 与 Humanloop CEO Raza Habib 以及其他 20 位开发者面对面进行了一场闭门交流，交流中他们讨论了 OpenAI 的近况与未来的规划', '董事会大部分成员都是独立的，独立董事不持有 OpenAI 股权', '\\n作为此次过渡的一部分，格雷格·布罗克曼 (Greg Brockman) 将辞去董事会主席职务，并继续担任公司职务，向首席执行官汇报']]}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'ids': [['117bb3155c55e8c047eac2991f223a9d',\n",
       "   '64f966f1121f6f9ca9ce063c45ed041c',\n",
       "   'e58cca1e4c8a41dd649c56fc2633e441',\n",
       "   'f138d4c353d3a128df6dc59503cec778',\n",
       "   '31caee0ad1aa8b854a0a53e1dca116d0',\n",
       "   '370ad0a4f00f2d76c62260f103fd00db',\n",
       "   'd9f3952b0c2d8dd486aa9dd83ae8ce99',\n",
       "   'afbf1d4740f869f6df4160bae2b56599',\n",
       "   'd81638dc38b0f9b935afa542b1b546f1',\n",
       "   'fc5e3b3dff275d9407215790714a0515']],\n",
       " 'distances': [[0.2700326144695282,\n",
       "   0.270065575838089,\n",
       "   0.3001212179660797,\n",
       "   0.3281254470348358,\n",
       "   0.3289717435836792,\n",
       "   0.3323637545108795,\n",
       "   0.33528652787208557,\n",
       "   0.337248831987381,\n",
       "   0.3567003011703491,\n",
       "   0.3608716130256653]],\n",
       " 'metadatas': [[{'md5': '117bb3155c55e8c047eac2991f223a9d'},\n",
       "   {'md5': '64f966f1121f6f9ca9ce063c45ed041c'},\n",
       "   {'md5': 'e58cca1e4c8a41dd649c56fc2633e441'},\n",
       "   {'md5': 'f138d4c353d3a128df6dc59503cec778'},\n",
       "   {'md5': '31caee0ad1aa8b854a0a53e1dca116d0'},\n",
       "   {'md5': '370ad0a4f00f2d76c62260f103fd00db'},\n",
       "   {'md5': 'd9f3952b0c2d8dd486aa9dd83ae8ce99'},\n",
       "   {'md5': 'afbf1d4740f869f6df4160bae2b56599'},\n",
       "   {'md5': 'd81638dc38b0f9b935afa542b1b546f1'},\n",
       "   {'md5': 'fc5e3b3dff275d9407215790714a0515'}]],\n",
       " 'embeddings': None,\n",
       " 'documents': [['\\n关于离职原因，OpenAI 方面给出的理由是：Sam Altman 先生的离职是在董事会经过审议后得出的结论，他在与董事会的沟通中始终不坦诚，阻碍了董事会履行职责的能力',\n",
       "   '\\n总结来看，自愿放弃创始股份的联合创始人兼 CEO Sam Altman 被 拥有股份的联合创始人兼首席科学家 Ilya Sutskever 和 3 个外部独立董事构成的董事会解除 CEO 职务',\n",
       "   '\\n虽然不知道这是否暗示了 Sam Altman 与董事会之间存在矛盾，但是 Sam Altman 的突然离开确实让外界感到震惊，包括 OpenAI 的员工都是在看到公告后才得知这一消息的，未来这位“传奇天才”是否会开创一家新的公司，甚至 与 OpenAI 竞争都不得而知了',\n",
       "   '董事会不再对他继续领导 OpenAI 的能力充满信心\\nSam Altman 也通过个人社交网站回应了离职一事儿：\\n关于临时继任者 Mira，其 在 OpenAI 领导团队任职五年，在 OpenAI 发展成为全球人工智能领导者的过程中发挥了关键作用',\n",
       "   '另一联合创始人兼总裁 Greg Brockman ，卸任董事会主席',\n",
       "   '\\n有参加了此次交流会的开发者表示，因为这是闭门交流会，所以 Altman 在交谈中表现出了开放的心态，讨论内容既涉及开发者面临的实际问题，也延伸到了商业竞争、AI 监管和开源等问题',\n",
       "   '有人指出，GPT 商店和收入分享是 Sam Altman 做的单方面商业决定，追求利润，而非一个负责任的非营利组织原则，威胁到了企业的核心价值',\n",
       "   '\\n与董事会之间存在矛盾，是否早有端倪？\\n今年 6 月份，OpenAI 创始人 Sam Altman 与 Humanloop CEO Raza Habib 以及其他 20 位开发者面对面进行了一场闭门交流，交流中他们讨论了 OpenAI 的近况与未来的规划',\n",
       "   '董事会大部分成员都是独立的，独立董事不持有 OpenAI 股权',\n",
       "   '\\n作为此次过渡的一部分，格雷格·布罗克曼 (Greg Brockman) 将辞去董事会主席职务，并继续担任公司职务，向首席执行官汇报']]}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_from_vec_db(\"Sam Altman 为什么被董事会罢免？\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_similarest_result(results):\n",
    "    similar = \"\"\n",
    "    distance = 1\n",
    "    for idx in range(len(results['ids'][0])):\n",
    "        if results['distances'][0][idx] < distance:\n",
    "            distance = results['distances'][0][idx]\n",
    "            similar = results['documents'][0][idx]\n",
    "    return similar\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后就是例子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'ids': [['117bb3155c55e8c047eac2991f223a9d', '64f966f1121f6f9ca9ce063c45ed041c', 'e58cca1e4c8a41dd649c56fc2633e441', 'f138d4c353d3a128df6dc59503cec778', '370ad0a4f00f2d76c62260f103fd00db', 'd9f3952b0c2d8dd486aa9dd83ae8ce99', 'afbf1d4740f869f6df4160bae2b56599', '31caee0ad1aa8b854a0a53e1dca116d0', '23669b5430f11d6766a933fc8c39f04e', 'fc5e3b3dff275d9407215790714a0515']], 'distances': [[0.264531672000885, 0.2734794020652771, 0.2768864333629608, 0.30970245599746704, 0.3169591724872589, 0.32905107736587524, 0.3493810296058655, 0.352994441986084, 0.3698384761810303, 0.3870091140270233]], 'metadatas': [[{'md5': '117bb3155c55e8c047eac2991f223a9d'}, {'md5': '64f966f1121f6f9ca9ce063c45ed041c'}, {'md5': 'e58cca1e4c8a41dd649c56fc2633e441'}, {'md5': 'f138d4c353d3a128df6dc59503cec778'}, {'md5': '370ad0a4f00f2d76c62260f103fd00db'}, {'md5': 'd9f3952b0c2d8dd486aa9dd83ae8ce99'}, {'md5': 'afbf1d4740f869f6df4160bae2b56599'}, {'md5': '31caee0ad1aa8b854a0a53e1dca116d0'}, {'md5': '23669b5430f11d6766a933fc8c39f04e'}, {'md5': 'fc5e3b3dff275d9407215790714a0515'}]], 'embeddings': None, 'documents': [['\\n关于离职原因，OpenAI 方面给出的理由是：Sam Altman 先生的离职是在董事会经过审议后得出的结论，他在与董事会的沟通中始终不坦诚，阻碍了董事会履行职责的能力', '\\n总结来看，自愿放弃创始股份的联合创始人兼 CEO Sam Altman 被 拥有股份的联合创始人兼首席科学家 Ilya Sutskever 和 3 个外部独立董事构成的董事会解除 CEO 职务', '\\n虽然不知道这是否暗示了 Sam Altman 与董事会之间存在矛盾，但是 Sam Altman 的突然离开确实让外界感到震惊，包括 OpenAI 的员工都是在看到公告后才得知这一消息的，未来这位“传奇天才”是否会开创一家新的公司，甚至 与 OpenAI 竞争都不得而知了', '董事会不再对他继续领导 OpenAI 的能力充满信心\\nSam Altman 也通过个人社交网站回应了离职一事儿：\\n关于临时继任者 Mira，其 在 OpenAI 领导团队任职五年，在 OpenAI 发展成为全球人工智能领导者的过程中发挥了关键作用', '\\n有参加了此次交流会的开发者表示，因为这是闭门交流会，所以 Altman 在交谈中表现出了开放的心态，讨论内容既涉及开发者面临的实际问题，也延伸到了商业竞争、AI 监管和开源等问题', '有人指出，GPT 商店和收入分享是 Sam Altman 做的单方面商业决定，追求利润，而非一个负责任的非营利组织原则，威胁到了企业的核心价值', '\\n与董事会之间存在矛盾，是否早有端倪？\\n今年 6 月份，OpenAI 创始人 Sam Altman 与 Humanloop CEO Raza Habib 以及其他 20 位开发者面对面进行了一场闭门交流，交流中他们讨论了 OpenAI 的近况与未来的规划', '另一联合创始人兼总裁 Greg Brockman ，卸任董事会主席', '刚刚，OpenAI通过其官网宣布任命其首席技术官 Mira Murati 为临时首席执行官，现任 CEO、被誉为“ChatGPT 之父”的Sam Altman将辞去首席执行官职务并离开董事会', '\\n作为此次过渡的一部分，格雷格·布罗克曼 (Greg Brockman) 将辞去董事会主席职务，并继续担任公司职务，向首席执行官汇报']]}\n",
      "Sam Altman被开除的原因是他在与董事会的沟通中不坦诚，阻碍了董事会履行职责的能力。\n"
     ]
    }
   ],
   "source": [
    "clear_history()\n",
    "\n",
    "question = \"Sam Altman为什么被开除？\"\n",
    "similar = get_similarest_result(query_from_vec_db(question))\n",
    "prompt = f\"\"\"\n",
    "对下面的信息做总结\n",
    "```\n",
    "Q:{question}\n",
    "A:{similar}\n",
    "```\n",
    "\"\"\"\n",
    "user_message(prompt)\n",
    "response = predict()\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{\"role\": \"user\", \"content\": \"\\n对下面的信息做总结\\n```\\nQ:Sam Altman为什么被开除？\\nA:\\n关于离职原因，OpenAI 方面给出的理由是：Sam Altman 先生的离职是在董事会经过审议后得出的结论，他在与董事会的沟通中始终不坦诚，阻碍了董事会履行职责的能力\\n```\\n\"}, {\"role\": \"assistant\", \"content\": \"Sam Altman被开除的原因是他在与董事会的沟通中不坦诚，阻碍了董事会履行职责的能力。\"}]\n"
     ]
    }
   ],
   "source": [
    "dump_chat_history()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#大模型数据新鲜度低的问题\n",
    "* 导入文本，chatpdf\n",
    "* 使用搜索引擎，外部工具，API\n",
    "* 向量数据库\n",
    "下一期看看langchain"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
